ATAC-seq QC

The objective of this document is to evaluate the QC metrics of this dataset with respect to the ATAC-seq assay. These have already QCed relatively well by Archr, so this is going to be an analysis to confirm the datasets are high quality and that certain marker genes have expected phenotypes.

suppressPackageStartupMessages(library(Seurat))
suppressPackageStartupMessages(library(Signac))
suppressPackageStartupMessages(library(dplyr))
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(patchwork))

Sequencing QC metrics

These are metrics we can calculate based on the sequencing alignments and our known gene annotations. Breifly they are:

Nucleosome signal - ratio of fragments greater than nucleosomal length (147) to less than nucleosomal length. Generally you want this to be low, since smaller fragments are TF bindings. TSS enrichment - how many reads come from TSS. Typically you want this to be higher as TSS should be enriched for acessible regions. Blacklist fraction - What portion of reads fall in sequencing regions that are overrepresented (i.e. artifact or high in low quality)

#ran this section interactively since it causes markdown issues. 

#nucleosomal signal, not looked at in archr  
#results<-NucleosomeSignal(results)
#Tss signaal, used a pretty permissive cutoff there, but most cells have been removed due to RNA cutoffs  
#results <- TSSEnrichment(results)

#we're skiping FRiP since we already know it's decent here both based on 10X QC but also ArchR
#blacklist
#results$blacklist_fraction <- FractionCountsInRegion( object = results, assay = 'ATAC', regions = blacklist_hg38)
results<-readRDS("~/gibbs/DOGMAMORPH/Ranalysis/Objects/20230613ATACQCObj.rds")


#no cluster or individual Fragment histogram differences
FragmentHistogram(results)
## Warning: Removed 69 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 32 rows containing missing values (`geom_bar()`).

FragmentHistogram(results, group.by = "orig.ident")
## Warning: Removed 69 rows containing non-finite values (`stat_bin()`).
## Warning: Removed 18 rows containing missing values (`geom_bar()`).

#cDC clearly bimodal
VlnPlot(results, "nCount_ATAC",pt.size = 0)

#PBMC lower (4,7,10)
VlnPlot(results, "nCount_ATAC",pt.size = 0, group.by = "orig.ident")

#similar observations for number of ATAC peaks obseved 
VlnPlot(results, "nFeature_ATAC",pt.size = 0)

VlnPlot(results, "nFeature_ATAC",pt.size = 0, group.by = "orig.ident")

#myeloid a little different
VlnPlot(results, "TSS.enrichment",pt.size = 0)

#JC4 a little lower
VlnPlot(results, "TSS.enrichment",pt.size = 0, group.by = "orig.ident")

#neigther are crazy, blacklist is basically null, which make sense since we filtered it in ArchR
VlnPlot(results, "blacklist_fraction",pt.size = 0)

VlnPlot(results, "blacklist_fraction",pt.size = 0, group.by = "orig.ident")

VlnPlot(results, "nucleosome_signal",pt.size = 0)

#library level heterogeneity, but generally values are low here, suggests we're fine
VlnPlot(results, "nucleosome_signal",pt.size = 0, group.by = "orig.ident")

Based on the above results, there appears to be differences in the QC between PBMC and CD4 libraries, and a larger number of fragments in certain myeloid populations, but overall the libraries look good as expected based on archr.

Marker gene inspection

The purpose here is to just check that we see cell specific differences for marker genes in chromatin accessibility. Below appears to show that we have good chromatin alignment with RNA based marker definition.

#looks in 

CoveragePlot(results,region = "BCL11B", features = "BCL11B", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "NKG7", features = "NKG7", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 25 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "CD3E", features = "CD3E", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "MS4A1", features = "MS4A1", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 22 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "LEF1", features = "LEF1", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 3 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "TREM1", features = "TREM1", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "LYZ", features = "LYZ", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 17 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "IL7R", features = "IL7R", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "CCR7", features = "CCR7", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "CD14", features = "CD14", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 27 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "CD8A", features = "CD8A", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 50 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "FCGR3A", features = "FCGR3A", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 5 rows containing missing values (`geom_segment()`).
## Warning: Removed 1 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "MS4A7", features = "MS4A7", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 23 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "GNLY", features = "GNLY", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "FCER1A", features = "FCER1A", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "CST3", features = "CST3", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "PPBP", features = "PPBP", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 2 rows containing missing values (`geom_segment()`).

#some TFs, low signal possibly because these are diluted 
CoveragePlot(results,region = "FOXP3", features = "FOXP3", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)
## Warning: Removed 21 rows containing missing values (`geom_segment()`).

CoveragePlot(results,region = "RORC", features = "RORC", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "GATA3", features = "GATA3", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

CoveragePlot(results,region = "TBX21", features = "TBX21", 
             expression.assay="RNA", 
             extend.upstream = 10000, extend.downstream = 10000)

devtools::session_info()
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
## - Session info ---------------------------------------------------------------
##  setting  value
##  version  R version 4.2.0 (2022-04-22)
##  os       Red Hat Enterprise Linux 8.8 (Ootpa)
##  system   x86_64, linux-gnu
##  ui       X11
##  language (EN)
##  collate  C
##  ctype    C
##  tz       Etc/UTC
##  date     2023-06-13
##  pandoc   3.1.1 @ /usr/lib/rstudio-server/bin/quarto/bin/tools/ (via rmarkdown)
## 
## - Packages -------------------------------------------------------------------
##  package          * version   date (UTC) lib source
##  abind              1.4-5     2016-07-21 [2] CRAN (R 4.2.0)
##  beeswarm           0.4.0     2021-06-01 [2] CRAN (R 4.2.0)
##  BiocGenerics       0.44.0    2022-11-01 [1] Bioconductor
##  BiocParallel       1.32.6    2023-03-17 [1] Bioconductor
##  Biostrings         2.66.0    2022-11-01 [1] Bioconductor
##  bitops             1.0-7     2021-04-24 [2] CRAN (R 4.2.0)
##  bslib              0.4.2     2022-12-16 [1] CRAN (R 4.2.0)
##  cachem             1.0.8     2023-05-01 [1] CRAN (R 4.2.0)
##  callr              3.7.3     2022-11-02 [1] CRAN (R 4.2.0)
##  cli                3.6.1     2023-03-23 [1] CRAN (R 4.2.0)
##  cluster            2.1.4     2022-08-22 [2] CRAN (R 4.2.0)
##  codetools          0.2-19    2023-02-01 [2] CRAN (R 4.2.0)
##  colorspace         2.1-0     2023-01-23 [2] CRAN (R 4.2.0)
##  cowplot            1.1.1     2020-12-30 [2] CRAN (R 4.2.0)
##  crayon             1.5.2     2022-09-29 [2] CRAN (R 4.2.0)
##  data.table         1.14.8    2023-02-17 [2] CRAN (R 4.2.0)
##  DBI                1.1.3     2022-06-18 [2] CRAN (R 4.2.0)
##  deldir             1.0-6     2021-10-23 [2] CRAN (R 4.2.0)
##  devtools           2.4.5     2022-10-11 [1] CRAN (R 4.2.0)
##  digest             0.6.31    2022-12-11 [2] CRAN (R 4.2.0)
##  dplyr            * 1.1.2     2023-04-20 [1] CRAN (R 4.2.0)
##  ellipsis           0.3.2     2021-04-29 [2] CRAN (R 4.2.0)
##  evaluate           0.20      2023-01-17 [2] CRAN (R 4.2.0)
##  fansi              1.0.4     2023-01-22 [2] CRAN (R 4.2.0)
##  farver             2.1.1     2022-07-06 [2] CRAN (R 4.2.0)
##  fastmap            1.1.1     2023-02-24 [1] CRAN (R 4.2.0)
##  fastmatch          1.1-3     2021-07-23 [2] CRAN (R 4.2.0)
##  fitdistrplus       1.1-8     2022-03-10 [2] CRAN (R 4.2.0)
##  fs                 1.6.1     2023-02-06 [2] CRAN (R 4.2.0)
##  future             1.32.0    2023-03-07 [1] CRAN (R 4.2.0)
##  future.apply       1.10.0    2022-11-05 [1] CRAN (R 4.2.0)
##  generics           0.1.3     2022-07-05 [2] CRAN (R 4.2.0)
##  GenomeInfoDb       1.34.9    2023-02-02 [1] Bioconductor
##  GenomeInfoDbData   1.2.9     2023-03-17 [1] Bioconductor
##  GenomicRanges      1.50.2    2022-12-16 [1] Bioconductor
##  ggbeeswarm         0.7.2     2023-04-29 [1] CRAN (R 4.2.0)
##  ggplot2          * 3.4.2     2023-04-03 [1] CRAN (R 4.2.0)
##  ggrastr            1.0.1     2021-12-08 [1] CRAN (R 4.2.0)
##  ggrepel            0.9.3     2023-02-03 [1] CRAN (R 4.2.0)
##  ggridges           0.5.4     2022-09-26 [1] CRAN (R 4.2.0)
##  globals            0.16.2    2022-11-21 [1] CRAN (R 4.2.0)
##  glue               1.6.2     2022-02-24 [2] CRAN (R 4.2.0)
##  goftest            1.2-3     2021-10-07 [2] CRAN (R 4.2.0)
##  gridExtra          2.3       2017-09-09 [2] CRAN (R 4.2.0)
##  gtable             0.3.3     2023-03-21 [1] CRAN (R 4.2.0)
##  highr              0.10      2022-12-22 [1] CRAN (R 4.2.0)
##  htmltools          0.5.5     2023-03-23 [1] CRAN (R 4.2.0)
##  htmlwidgets        1.6.2     2023-03-17 [1] CRAN (R 4.2.0)
##  httpuv             1.6.9     2023-02-14 [1] CRAN (R 4.2.0)
##  httr               1.4.5     2023-02-24 [1] CRAN (R 4.2.0)
##  ica                1.0-3     2022-07-08 [2] CRAN (R 4.2.0)
##  igraph             1.4.2     2023-04-07 [1] CRAN (R 4.2.0)
##  IRanges            2.32.0    2022-11-01 [1] Bioconductor
##  irlba              2.3.5.1   2022-10-03 [1] CRAN (R 4.2.0)
##  jquerylib          0.1.4     2021-04-26 [2] CRAN (R 4.2.0)
##  jsonlite           1.8.4     2022-12-06 [2] CRAN (R 4.2.0)
##  KernSmooth         2.23-20   2021-05-03 [2] CRAN (R 4.2.0)
##  knitr              1.42      2023-01-25 [1] CRAN (R 4.2.0)
##  labeling           0.4.2     2020-10-20 [2] CRAN (R 4.2.0)
##  later              1.3.0     2021-08-18 [2] CRAN (R 4.2.0)
##  lattice            0.21-8    2023-04-05 [1] CRAN (R 4.2.0)
##  lazyeval           0.2.2     2019-03-15 [2] CRAN (R 4.2.0)
##  leiden             0.4.3     2022-09-10 [1] CRAN (R 4.2.0)
##  lifecycle          1.0.3     2022-10-07 [1] CRAN (R 4.2.0)
##  listenv            0.9.0     2022-12-16 [2] CRAN (R 4.2.0)
##  lmtest             0.9-40    2022-03-21 [2] CRAN (R 4.2.0)
##  magrittr           2.0.3     2022-03-30 [2] CRAN (R 4.2.0)
##  MASS               7.3-59    2023-04-21 [1] CRAN (R 4.2.0)
##  Matrix             1.5-4     2023-04-04 [1] CRAN (R 4.2.0)
##  matrixStats        0.63.0    2022-11-18 [2] CRAN (R 4.2.0)
##  memoise            2.0.1     2021-11-26 [2] CRAN (R 4.2.0)
##  mime               0.12      2021-09-28 [2] CRAN (R 4.2.0)
##  miniUI             0.1.1.1   2018-05-18 [2] CRAN (R 4.2.0)
##  munsell            0.5.0     2018-06-12 [2] CRAN (R 4.2.0)
##  nlme               3.1-162   2023-01-31 [1] CRAN (R 4.2.0)
##  parallelly         1.35.0    2023-03-23 [1] CRAN (R 4.2.0)
##  patchwork        * 1.1.2     2022-08-19 [1] CRAN (R 4.2.0)
##  pbapply            1.7-0     2023-01-13 [1] CRAN (R 4.2.0)
##  pillar             1.9.0     2023-03-22 [1] CRAN (R 4.2.0)
##  pkgbuild           1.4.0     2022-11-27 [1] CRAN (R 4.2.0)
##  pkgconfig          2.0.3     2019-09-22 [2] CRAN (R 4.2.0)
##  pkgload            1.3.2     2022-11-16 [1] CRAN (R 4.2.0)
##  plotly             4.10.1    2022-11-07 [1] CRAN (R 4.2.0)
##  plyr               1.8.8     2022-11-11 [1] CRAN (R 4.2.0)
##  png                0.1-8     2022-11-29 [1] CRAN (R 4.2.0)
##  polyclip           1.10-4    2022-10-20 [1] CRAN (R 4.2.0)
##  prettyunits        1.1.1     2020-01-24 [2] CRAN (R 4.2.0)
##  processx           3.8.1     2023-04-18 [1] CRAN (R 4.2.0)
##  profvis            0.3.8     2023-05-02 [1] CRAN (R 4.2.0)
##  progressr          0.13.0    2023-01-10 [1] CRAN (R 4.2.0)
##  promises           1.2.0.1   2021-02-11 [2] CRAN (R 4.2.0)
##  ps                 1.7.5     2023-04-18 [1] CRAN (R 4.2.0)
##  purrr              1.0.1     2023-01-10 [1] CRAN (R 4.2.0)
##  R6                 2.5.1     2021-08-19 [2] CRAN (R 4.2.0)
##  RANN               2.6.1     2019-01-08 [2] CRAN (R 4.2.0)
##  RColorBrewer       1.1-3     2022-04-03 [2] CRAN (R 4.2.0)
##  Rcpp               1.0.10    2023-01-22 [1] CRAN (R 4.2.0)
##  RcppAnnoy          0.0.20    2022-10-27 [1] CRAN (R 4.2.0)
##  RcppRoll           0.3.0     2018-06-05 [2] CRAN (R 4.2.0)
##  RCurl              1.98-1.12 2023-03-27 [1] CRAN (R 4.2.0)
##  remotes            2.4.2     2021-11-30 [2] CRAN (R 4.2.0)
##  reshape2           1.4.4     2020-04-09 [2] CRAN (R 4.2.0)
##  reticulate         1.28      2023-01-27 [1] CRAN (R 4.2.0)
##  rlang              1.1.1     2023-04-28 [1] CRAN (R 4.2.0)
##  rmarkdown          2.22      2023-06-01 [1] CRAN (R 4.2.0)
##  ROCR               1.0-11    2020-05-02 [2] CRAN (R 4.2.0)
##  Rsamtools          2.14.0    2022-11-01 [1] Bioconductor
##  rstudioapi         0.14      2022-08-22 [1] CRAN (R 4.2.0)
##  Rtsne              0.16      2022-04-17 [2] CRAN (R 4.2.0)
##  S4Vectors          0.36.2    2023-02-26 [1] Bioconductor
##  sass               0.4.5     2023-01-24 [1] CRAN (R 4.2.0)
##  scales             1.2.1     2022-08-20 [1] CRAN (R 4.2.0)
##  scattermore        0.8       2022-02-14 [1] CRAN (R 4.2.0)
##  sctransform        0.3.5     2022-09-21 [1] CRAN (R 4.2.0)
##  sessioninfo        1.2.2     2021-12-06 [2] CRAN (R 4.2.0)
##  Seurat           * 4.3.0     2022-11-18 [1] CRAN (R 4.2.0)
##  SeuratObject     * 4.1.3     2022-11-07 [1] CRAN (R 4.2.0)
##  shiny              1.7.4     2022-12-15 [1] CRAN (R 4.2.0)
##  Signac           * 1.9.0     2022-12-08 [1] CRAN (R 4.2.0)
##  sp                 1.6-0     2023-01-19 [1] CRAN (R 4.2.0)
##  spatstat.data      3.0-1     2023-03-12 [1] CRAN (R 4.2.0)
##  spatstat.explore   3.1-0     2023-03-14 [1] CRAN (R 4.2.0)
##  spatstat.geom      3.1-0     2023-03-12 [1] CRAN (R 4.2.0)
##  spatstat.random    3.1-4     2023-03-13 [1] CRAN (R 4.2.0)
##  spatstat.sparse    3.0-1     2023-03-12 [1] CRAN (R 4.2.0)
##  spatstat.utils     3.0-2     2023-03-11 [1] CRAN (R 4.2.0)
##  stringi            1.7.12    2023-01-11 [1] CRAN (R 4.2.0)
##  stringr            1.5.0     2022-12-02 [1] CRAN (R 4.2.0)
##  survival           3.5-5     2023-03-12 [1] CRAN (R 4.2.0)
##  tensor             1.5       2012-05-05 [2] CRAN (R 4.2.0)
##  tibble             3.2.1     2023-03-20 [1] CRAN (R 4.2.0)
##  tidyr              1.3.0     2023-01-24 [1] CRAN (R 4.2.0)
##  tidyselect         1.2.0     2022-10-10 [1] CRAN (R 4.2.0)
##  urlchecker         1.0.1     2021-11-30 [1] CRAN (R 4.2.0)
##  usethis            2.1.6     2022-05-25 [1] CRAN (R 4.2.0)
##  utf8               1.2.3     2023-01-31 [1] CRAN (R 4.2.0)
##  uwot               0.1.14    2022-08-22 [1] CRAN (R 4.2.0)
##  vctrs              0.6.2     2023-04-19 [1] CRAN (R 4.2.0)
##  vipor              0.4.5     2017-03-22 [2] CRAN (R 4.2.0)
##  viridisLite        0.4.2     2023-05-02 [1] CRAN (R 4.2.0)
##  withr              2.5.0     2022-03-03 [2] CRAN (R 4.2.0)
##  xfun               0.39      2023-04-20 [1] CRAN (R 4.2.0)
##  xtable             1.8-4     2019-04-21 [2] CRAN (R 4.2.0)
##  XVector            0.38.0    2022-11-01 [1] Bioconductor
##  yaml               2.3.7     2023-01-23 [1] CRAN (R 4.2.0)
##  zlibbioc           1.44.0    2022-11-01 [1] Bioconductor
##  zoo                1.8-12    2023-04-13 [1] CRAN (R 4.2.0)
## 
##  [1] /gpfs/gibbs/project/ya-chi_ho/jac369/R/4.2
##  [2] /vast/palmer/apps/avx2/software/R/4.2.0-foss-2020b/lib64/R/library
## 
## ------------------------------------------------------------------------------